Abstract

We show the connection between the Walsh spectrum of the output of 
a binary random number generator (RNG) and the bias of individual bits, 
and use this to show how previously known bounds on the performance 
of linear binary codes as entropy extractors can be derived by considering 
generator matrices as a selector of a subset of that spectrum. We explicitly 
show the connection with the code’s weight distribution, then extend this 
framework to the case of non-binary finite fields by the Fourier transform. 


1 Introduction 

Our objective is to obtain sharp bounds for a finite number of iterations 
of a conditioning procedure applied to the output of an entropy source, 
rather than results relying on an asymptotic convergence or ones based 
on randomly choosing a conditioning function from among a large class, 
e.g. universal hash functions. We follow the recommendations set out by 
NIST [BK12] for the precise meaning to be given to these terms. This is 
closely related to the subject of randomness extractors, as summarised for 
instance in [Sha11]. We consider linear transformations applied to sources
of entropy producing independent output; bounds on the distance from
the uniform distribution of such sequences have been shown in [Lac08;
Lac09; ZB11], where the entropy source was assumed to produce biased
independent bits, and the conditioning function was the generator matrix 
of a binary linear code. 

We are especially interested in random variables that, in addition to
satisfying said constraints, are discrete, in the sense that the number of
possible outcomes is finite and the variables admit a discrete probability
mass function $\mu_X(j) = P(X = x_j)$; moreover, we begin by considering
these variables to take values in a finite field $\mathbb{F}_p$ or a vector
space $(\mathbb{F}_p)^k$, with particular regard to the special case of
binary variables, $p = 2$.

In Section 2 we show the connection between the Walsh spectrum of
the output of a binary random number generator (RNG) and the bias of
individual bits, and use this in Section 3 to show how previously known
bounds on the performance of linear binary codes as RNG post-processing
functions can be derived as a special case by considering generator
matrices as a selector of a subset of that spectrum, explicitly showing
the connection with the code's weight distribution. We then extend this
framework to the case of output in non-binary finite fields by use of the
Fourier transform in Section 5.

2 Total variation distance and the Walsh-Hadamard transform

We show in the following one way in which the Walsh-Hadamard transform
may be used to bound the total variation distance (TVD) of binary random
variables with a known probability mass function. This may seem an
unnecessary exercise since the TVD can simply be computed exactly from
this knowledge, but aside from revealing some interesting structure to the
calculation it will become more explicitly useful in the following section.
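Consider a random vector $Y \in (\mathbb{F}_2)^k$ with probability mass
function

\[ \mu_Y(j) = P(Y = \mathbf{j}), \]

where in writing $j$ and $\mathbf{j}$ we use the binary representation of
integers as vectors: for $a \in \mathbb{Z}_{2^k}$,

\[ a = \sum_{j=0}^{k-1} a_j 2^j, \qquad
   \mathbf{a} = (a_0, \dots, a_{k-1}) \in (\mathbb{F}_2)^k . \]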


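The $a$-th order Walsh function evaluated at $\mathbf{b}$ is

\[ h_a(\mathbf{b}) = (-1)^{\mathbf{a} \cdot \mathbf{b}} \tag{1} \]

with $\cdot$ the dot product on $(\mathbb{F}_2)^k$. The Walsh characteristic
function of $Y$ as defined in [Pea71] is

\begin{align}
\chi_j(Y) &= \sum_{a=0}^{2^k-1} h_j(a)\, \mu_Y(a) \tag{2} \\
          &= E[h_j(Y)] . \tag{3}
\end{align}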


Note that the dot product of two binary vectors $\mathbf{b} \cdot \mathbf{v}$
is the bitwise sum, i.e. the linear combination, of those elements $v(i)$
that correspond to the non-zero entries of $\mathbf{b}$; therefore, the
random variable

\[ h_b(Y) = (-1)^{\mathbf{b} \cdot Y} \]

will take value $-1$ if the linear combination of the selected elements of
$Y$ is equal to one, and $1$ otherwise. The sum of the selected elements is
itself a random variable, $B = \mathbf{b} \cdot Y$, following the Bernoulli
distribution with probability $\mu_B(1)$ of being equal to 1; it follows
that

\[ h_b(Y) = 1 - 2B, \]

and hence we can conclude that

\begin{align*}
\chi_b(Y) &= E[h_b(Y)] \\
          &= 1 - 2\mu_B(1) .
\end{align*}

We note now that the bias of a Bernoulli random variable
$B \in \mathbb{F}_2$ is commonly defined as

\begin{align*}
\varepsilon_B &= \tfrac{1}{2}\, |P(B = 1) - P(B = 0)| \\
              &= \tfrac{1}{2}\, |2P(B = 1) - 1| \\
              &= \tfrac{1}{2}\, |2E[B] - 1| ,
\end{align*}

and we observe that the Walsh characteristic of $\mathbf{b} \cdot Y$ leads
to the bias of the $b$-th linear combination of the elements of $Y$ via the
relation

\[ |\chi_b(Y)| = 2\,\varepsilon_{\mathbf{b} \cdot Y} . \tag{4} \]

In particular, the combinations corresponding to exact powers of two,
$b = 2^j$, lead to the bias of each individual element of $Y$; and the
zeroth Walsh function, corresponding to $b = 0$, is equal to 1 in all
entries, so that the zeroth Walsh characteristic is equal to 1 in all cases.
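
The relation between Walsh characteristics and bias can be checked
numerically. The following Python sketch evaluates the Walsh characteristic
directly from its definition as a signed sum over the mass function, and
compares its magnitude with twice the bias $\tfrac12 |2P(B = 1) - 1|$ of the
corresponding linear combination. The mass function, the choice $k = 3$ and
the helper names `parity`, `walsh_characteristics` and
`bias_of_combination` are assumptions made purely for illustration.

```python
import numpy as np

def parity(x):
    """Sum modulo 2 of the binary digits of the integer x."""
    return bin(x).count("1") % 2

def walsh_characteristics(mu):
    """chi_b = sum_a (-1)^(a.b) mu(a), with a.b the dot product of binary digits."""
    size = len(mu)
    return np.array([
        sum((-1) ** parity(a & b) * mu[a] for a in range(size))
        for b in range(size)
    ])

def bias_of_combination(mu, b):
    """Bias 1/2 |2 P(B = 1) - 1| of the Bernoulli variable B = b.Y."""
    p_one = sum(mu[a] for a in range(len(mu)) if parity(a & b) == 1)
    return 0.5 * abs(2.0 * p_one - 1.0)

# Hypothetical mass function of Y in (F_2)^3, indexed by the integer a.
rng = np.random.default_rng(0)
mu_Y = rng.random(8)
mu_Y /= mu_Y.sum()

chi = walsh_characteristics(mu_Y)
eps = np.array([bias_of_combination(mu_Y, b) for b in range(8)])

for b in range(8):
    print(b, round(float(chi[b]), 6), round(float(2 * eps[b]), 6))
```

The indices $b = 1, 2, 4$ select single bits of $Y$, so the corresponding
rows give the bias of each individual bit.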

The set $\{h_i\}$ is known to correspond to the rows of a Hadamard
matrix $H$ of size $2^k$; the set of all Walsh characteristics of $Y$ can
thus be written compactly in matrix notation as

\[ \chi(Y) = H \mu_Y . \tag{5} \]

As a matter of notation, for a uniformly distributed random variable
$U \in (\mathbb{F}_2)^k$ we have

\[ \mu_U = \frac{\mathbf{1}}{2^k}, \qquad
   \chi(U) = H \mu_U = I_{2^k}(\cdot, 1) \]

with $\mathbf{1}$ a column vector of ones and $I_{2^k}(\cdot, 1)$ the first
column of the identity matrix of size $2^k$. We may use this to estimate the
total variation distance of $Y$ from uniform as follows.

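Theorem 1. The total variation distance of a random vector
$Y \in (\mathbb{F}_2)^k$ from uniform $U$,

\[ \delta_Y = \mathrm{TVD}(Y, U) = \tfrac{1}{2}\, \|\mu_Y - \mu_U\|_1 , \]

is bounded by the sum of the bias of all non-trivial linear combinations of
the output bits,

\[ \delta_Y \le \sum_{\mathbf{b} \in (\mathbb{F}_2)^k \setminus 0}
   \varepsilon_{\mathbf{b} \cdot Y} . \]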
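Proof.

\begin{align}
\|\mu_Y - \mu_U\|_1
  &= \left\| \frac{H^T H}{2^k} (\mu_Y - \mu_U) \right\|_1 \tag{6} \\
  &= \frac{1}{2^k} \left\| H^T \big(\chi(Y) - \chi(U)\big) \right\|_1 \tag{7} \\
  &\le \|\chi(Y) - \chi(U)\|_1 \tag{8} \\
  &= 2 \sum_{\mathbf{b} \in (\mathbb{F}_2)^k \setminus 0}
     \varepsilon_{\mathbf{b} \cdot Y} \tag{9}
\end{align}

Here $H^T$ is the transpose of the Hadamard matrix $H$. Equation (6) follows
from $H^T / \sqrt{2^k}$ being the unitary inverse Hadamard transform, so
that $H^T H / 2^k$ is the identity; Equation (7) uses the definition of
$\chi(Y)$ in Equation (5); the bound in Equation (8) stems from the $\ell_1$
bound on $q$-dimensional vector spaces,
$\|\cdot\|_1 \le \sqrt{q}\, \|\cdot\|_2$, with $q = 2^k$, followed by
$\|\cdot\|_2 \le \|\cdot\|_1$; and lastly, Equation (9) follows from
Equation (4), since the zeroth entries of $\chi(Y)$ and $\chi(U)$ are both
equal to 1. Halving both sides gives the bound on $\delta_Y$. □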
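
As a sanity check of Theorem 1, the sketch below builds the Hadamard matrix
by repeated Kronecker products, computes all Walsh characteristics as
$H \mu_Y$, and compares the exact total variation distance with the sum of
the biases $\tfrac12 |\chi_b(Y)|$ over $b \ne 0$. The random mass function
and the function names are hypothetical and serve only as an example.

```python
import numpy as np

def hadamard(k):
    """Sylvester construction of the Hadamard matrix of size 2^k."""
    H2 = np.array([[1, 1], [1, -1]])
    H = np.array([[1]])
    for _ in range(k):
        H = np.kron(H2, H)
    return H

def tvd_from_uniform(mu):
    """Total variation distance 1/2 ||mu - mu_U||_1."""
    return 0.5 * np.abs(mu - 1.0 / len(mu)).sum()

def walsh_bias_bound(mu):
    """Sum of the biases of all non-trivial linear combinations."""
    k = len(mu).bit_length() - 1      # len(mu) is assumed to be a power of two
    chi = hadamard(k) @ mu            # all Walsh characteristics at once
    eps = 0.5 * np.abs(chi)           # bias of each linear combination
    return eps[1:].sum()              # skip the trivial combination b = 0

# Hypothetical mass function of Y in (F_2)^4.
rng = np.random.default_rng(1)
mu_Y = rng.random(16)
mu_Y /= mu_Y.sum()

tvd = tvd_from_uniform(mu_Y)
bound = walsh_bias_bound(mu_Y)
print(f"TVD = {tvd:.4f}, bound = {bound:.4f}")
```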

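Corollary 1. If the bits $Y(j) \in \mathbb{F}_2$ are i.i.d. with known bias
$\varepsilon_Y$, then

\begin{align*}
\delta_Y &\le \sum_{l=1}^{k} A_l\, \varepsilon_Y^l \\
         &= \sum_{l=1}^{k} \binom{k}{l} \varepsilon_Y^l
\end{align*}

where $A_l$ is the number of $\mathbf{b} \in (\mathbb{F}_2)^k$ with Hamming
weight $w(\mathbf{b}) = l$.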


3 W-H bound on binary generator matrices as extractors


We now consider the previous bound as applied to random variables
$Y = GX$, with $X \in (\mathbb{F}_2)^n$ a sequence of $n$ Bernoulli random
variables with known probability mass $P(X = \mathbf{b}) = \mu_X(b)$ and
identical bias $\varepsilon_X = |1 - 2P(X(i) = 1)|$ for each bit (note that
the factor $1/2$ used in Section 2 is omitted here), and
$G \in (\mathbb{F}_2)^{k \times n}$ the generator matrix of an $(n, k, d)$
linear code $C$ with weight distribution $\{A_l\}$; in other words, $C$ is
a subspace of $(\mathbb{F}_2)^n$ and $A_l$ is the number of
$\mathbf{c} \in C$ with Hamming weight $w(\mathbf{c}) = l$.

The definition of Walsh characteristic functions as expected values in
Equation (3) directly leads to

\begin{align*}
\chi_b(GX) &= E[h_b(GX)] \\
           &= \sum_{x=0}^{2^n-1} h_b(G\mathbf{x})\, \mu_X(x) .
\end{align*}

We note here that the dot product in the Walsh function can equivalently
be expressed using the transpose $\mathbf{b}^T$ as

\[ \mathbf{b} \cdot G\mathbf{x} = \mathbf{b}^T G \mathbf{x} , \]

and in particular the product

\[ \mathbf{c}^T = \mathbf{b}^T G \]

is a linear combination of the rows of $G$: since $G$ is the generator
matrix of a linear code $C$ the rows of $G$ form a basis of $C$, and hence
any linear combination of them is again a word $\mathbf{c} \in C$.
Consequently, just as the Walsh characteristic led to a measure of bias in
Equation (4), we can conclude that

\[ |\chi_b(Y)| = \varepsilon_{\mathbf{c} \cdot X} . \tag{10} \]

In other words, the bias of the $b$-th linear combination of the elements
of $Y$ is equal to the bias of a linear combination of $w(\mathbf{c})$ bits
of $X$ (compare to Equation (4)). This leads directly to the following
bound.

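Theorem 2. Let $Y = GX$, with $X \in (\mathbb{F}_2)^n$ a sequence of $n$
independent but not necessarily identically distributed Bernoulli random
variables, and $G \in (\mathbb{F}_2)^{k \times n}$ the generator matrix of
an $(n, k, d)$ linear code $C$. The total variation distance of the random
variable $Y \in (\mathbb{F}_2)^k$ from uniform,

\[ \delta_Y = \mathrm{TVD}(Y, U) = \tfrac{1}{2}\, \|\mu_Y - \mu_U\|_1 , \]

is bounded by the sum of the bias of all linear combinations of the bits in
$X$ defined by the codewords of $C$, in the following measure:

\begin{align*}
\delta_Y &\le \sum_{\mathbf{b} \in (\mathbb{F}_2)^k \setminus 0}
              \varepsilon_{\mathbf{b}^T G X} \\
         &= \sum_{\mathbf{c} \in C \setminus 0}
              \varepsilon_{\mathbf{c} \cdot X} .
\end{align*}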

Corollary 2. If the bits $X(j) \in \mathbb{F}_2$ are i.i.d. with known bias
$\varepsilon_X$, then

\[ \delta_Y \le \sum_{l=d}^{n} A_l\, \varepsilon_X^l , \]

with $\{A_l\}$ the weight distribution of $C$.

Note that Corollary 1 is closely related to Corollary 2 if we consider
that in this context the $\{A_l\}$ in the former correspond precisely to the
weight distribution of the trivial code given by the message space itself,
$(\mathbb{F}_2)^k$. A particular case of Corollary 2 for strictly binomial
$\{A_l\}$ was proved in [ZB11], Theorem 6. We can thus recover the following
known bound (see [Lac08], Theorem 1):

Corollary 3. Considering only the minimum distance $d$ rather than the
full weight distribution,

\[ \delta_Y \le (2^k - 1)\, \varepsilon_X^d . \]
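
To make the role of the weight distribution concrete, the following sketch
evaluates the weight-distribution bound and the minimum-distance bound for a
small code and compares them with the exact total variation distance
obtained by enumerating every input. The choice of a $(7, 4, 3)$ Hamming
code in systematic form, the value $P(X(i) = 1) = 0.6$ and all function
names are assumptions made for illustration; the bias is taken as
$|1 - 2P(X(i) = 1)|$.

```python
import itertools
import numpy as np

# Generator matrix of a (7, 4, 3) Hamming code (illustrative choice).
G = np.array([
    [1, 0, 0, 0, 1, 1, 0],
    [0, 1, 0, 0, 1, 0, 1],
    [0, 0, 1, 0, 0, 1, 1],
    [0, 0, 0, 1, 1, 1, 1],
])
k, n = G.shape

def weight_distribution(G):
    """A[l] = number of codewords b^T G with Hamming weight l."""
    k, n = G.shape
    A = np.zeros(n + 1, dtype=int)
    for b in itertools.product([0, 1], repeat=k):
        c = (np.array(b) @ G) % 2
        A[int(c.sum())] += 1
    return A

def exact_tvd(G, p_one):
    """Exact TVD from uniform of Y = G X for i.i.d. bits with P(X(i) = 1) = p_one."""
    k, n = G.shape
    place_values = 2 ** np.arange(k)
    mu_Y = np.zeros(2 ** k)
    for x in itertools.product([0, 1], repeat=n):
        x = np.array(x)
        y = (G @ x) % 2
        w = int(x.sum())
        mu_Y[int(y @ place_values)] += p_one ** w * (1 - p_one) ** (n - w)
    return 0.5 * np.abs(mu_Y - 2.0 ** -k).sum()

p_one = 0.6
eps_X = abs(1 - 2 * p_one)                  # bias of each input bit
A = weight_distribution(G)
d = int(np.flatnonzero(A[1:])[0]) + 1       # minimum distance of the code

bound_weights = sum(A[l] * eps_X ** l for l in range(d, n + 1))
bound_distance = (2 ** k - 1) * eps_X ** d
tvd = exact_tvd(G, p_one)
print(A.tolist(), d, tvd, bound_weights, bound_distance)
```

In this example the bound based on the full weight distribution is
noticeably tighter than the one based on the minimum distance alone.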
4 Total variation distance and the Fourier transform


The Hadamard transform is a special case of the Fourier transform
constructed with the primitive 2-nd root of unity, $p = 2$,
$\omega_p = -1$, and the Hadamard matrix of size $2^k$ is constructed by the
Kronecker product $H_{2^k} = H_2 \otimes H_{2^{k-1}}$, so the binary case
considered in Section 3 can be seen as a special case. Employing the Fourier
transform is natural in this setting and closely follows well-established
techniques for the sum of continuous random variables, which have their own
convolution theorem and proofs of convergence to a limiting distribution.

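Given an integer $a \in \mathbb{Z}_{p^k}$, we denote its $p$-ary
representation by

\[ a = \sum_{j=0}^{k-1} a_j p^j, \qquad
   \mathbf{a} = (a_0, \dots, a_{k-1}) \in (\mathbb{F}_p)^k . \]

We shall use this notation interchangeably in the following as a natural
indexing of the elements of $(\mathbb{F}_p)^k$.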

Consider a random variable $Z \in \mathbb{F}_p$ with probability mass
function

\[ \mu_Z(j) = P(Z = \beta_j) . \]

Note that this implicitly assumes an ordering of the mass function $\mu_Z$
corresponding to the representation of elements $\beta_j \in \mathbb{F}_p$
as integers. The discrete Fourier transform of $\mu_Z$ may then be written
in matrix form as

\[ F_p \mu_Z = \lambda_Z , \]

where $\lambda_Z$ is the set of eigenvalues of the circulant matrix
generated by $\mu_Z$. Indeed, the above can be restated in terms of the
unitary DFT,

\[ \mathcal{F}_p = \frac{F_p}{\sqrt{p}}, \qquad
   \mathcal{F}_p^* = \frac{F_p^*}{\sqrt{p}}, \]

with $F_p^*$ the conjugate transpose of $F_p$, diagonalising the circulant
matrix $C_Z$ generated by $\mu_Z$:

\begin{align*}
C_Z &= \mathrm{circ}(\mu_Z) \\
\mathcal{F}_p\, C_Z\, \mathcal{F}_p^* &= \Lambda_Z
\end{align*}

with $\Lambda_Z$ the $p \times p$ diagonal matrix containing all eigenvalues
of $C_Z$. Note that for a uniformly distributed random variable
$U \in \mathbb{F}_p$, we have

\[ \mu_U = \frac{\mathbf{1}}{p}, \qquad
   \lambda_U = F_p \mu_U = I_p(\cdot, 1) \]

where $\mathbf{1}$ is a vector of ones of length $p$, and the only non-zero
eigenvalue is the zeroth one, so the full set $\lambda_U$ corresponds to the
first column of the identity matrix of size $p$.
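Lemma 1. The mass function $\mu_Z$ of a random variable
$Z \in \mathbb{F}_p$ satisfies

\[ \|\mu_Z - \mu_U\|_2 = \frac{1}{\sqrt{p}}\, \|\lambda_Z - \lambda_U\|_2 \]

where $\lambda_Z = F_p \mu_Z$ is the discrete Fourier transform of $\mu_Z$.

Proof.

\begin{align}
\|\mu_Z - \mu_U\|_2
  &= \frac{1}{\sqrt{p}} \left\| \frac{F_p^*}{\sqrt{p}}
     (\lambda_Z - \lambda_U) \right\|_2 \tag{11} \\
  &= \frac{1}{\sqrt{p}}\, \|\lambda_Z - \lambda_U\|_2 \tag{12} \\
  &= \frac{1}{\sqrt{p}} \left( \sum_{i=1}^{p-1} |\lambda_Z(i)|^2
     \right)^{1/2} \tag{13}
\end{align}

Equation (12) follows from the unitary Fourier transform preserving
$\ell_2$ distance. □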


We can obtain a first, crude bound on the TVD by considering the
largest non-trivial eigenvalue, defined as follows for future reference.

Definition 1. Given a random variable $Z \in \mathbb{F}_p$ with mass
function $\mu_Z$ and eigenvalues $F_p \mu_Z = \lambda_Z$, denote the
greatest non-trivial eigenvalue by

\[ \lambda_{Z*} = \max_{1 \le i \le p-1} |\lambda_Z(i)| . \]

Theorem 3. The total variation distance of a random variable
$Z \in \mathbb{F}_p$ from uniform,

\[ \delta_Z = \mathrm{TVD}(Z, U) = \tfrac{1}{2}\, \|\mu_Z - \mu_U\|_1 , \]

is bounded by

\[ \delta_Z \le (p - 1)^{1/2}\, \lambda_{Z*} \]

with $\lambda_{Z*}$ as in Definition 1.

Proof. This follows from considering the worst-case scenario in which all
eigenvalues $\lambda_Z$ in Lemma 1, except the zeroth eigenvalue
$\lambda_Z(0) = 1$, are equal to the greatest non-trivial eigenvalue
$\lambda_{Z*}$, and applying the bound on $p$-dimensional vector spaces
$\|z\|_1 \le p^{1/2} \|z\|_2$.

□
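
The quantities in Lemma 1, Definition 1 and Theorem 3 are straightforward to
compute with a standard FFT routine. The sketch below does so for a
hypothetical mass function on $\mathbb{F}_5$, and also confirms that the DFT
values coincide with the eigenvalues of the circulant matrix generated by
the mass function. It assumes the sign convention
$\omega_p = e^{-2\pi i/p}$ used by `numpy.fft.fft`; the helper `circulant`
is written out here rather than taken from a library.

```python
import numpy as np

def circulant(mu):
    """C[r, j] = mu[(r - j) mod p]: the convolution matrix generated by mu."""
    p = len(mu)
    return np.array([[mu[(r - j) % p] for j in range(p)] for r in range(p)])

p = 5
mu_Z = np.array([0.4, 0.3, 0.1, 0.1, 0.1])   # hypothetical mass function on F_5
mu_U = np.full(p, 1.0 / p)

lam_Z = np.fft.fft(mu_Z)      # F_p mu_Z
lam_U = np.fft.fft(mu_U)      # first column of the identity matrix

# Lemma 1: the l2 distances agree up to the factor 1 / sqrt(p).
lhs = np.linalg.norm(mu_Z - mu_U)
rhs = np.linalg.norm(lam_Z - lam_U) / np.sqrt(p)

# Definition 1 and Theorem 3.
lam_star = np.abs(lam_Z[1:]).max()
tvd = 0.5 * np.abs(mu_Z - mu_U).sum()
bound = np.sqrt(p - 1) * lam_star

# The DFT values are the eigenvalues of the circulant matrix.
eig = np.linalg.eigvals(circulant(mu_Z))
print(lhs, rhs, tvd, bound)
```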


We can now consider how this affects the distribution of a sum of two
variables, $S_2 = Z_0 + Z_1 \in \mathbb{F}_p$, which is the discrete
convolution of the two probability masses,

\begin{align*}
P(S_2 = r) &= \sum_{j=0}^{p-1} P(Z_0 = j)\, P(Z_1 = r - j) \\
           &= \sum_{j=0}^{p-1} \mu_{Z_0}(j)\, \mu_{Z_1}(r - j) .
\end{align*}
The distribution of $S_2$ may then be expressed in matrix notation as

\[ \mu_{S_2} = C_{Z_1} \mu_{Z_0} , \tag{14} \]

where $C_{Z_1}$ is the circulant matrix uniquely defined by $\mu_{Z_1}$. In
other words, the entry $(r, j)$ of $C_{Z_1}$ contains the measure under
$Z_1$ of the element $\beta_r - \beta_j \in \mathbb{F}_p$, which we denote
in matrix form by

\[ C_{Z_1} = \mu_{Z_1}(B), \qquad B(r, j) = \beta_r - \beta_j . \]

Considering the particular case of summing two identical variables $Z$
with probability mass function $\mu_Z$, the distribution of
$S_2 = \sum^{2} Z$ may be written as

\[ S_2 \sim C_Z \mu_Z , \]


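where $C_Z$ is the circulant matrix defined uniquely by $\mu_Z$ itself. By
induction,

\begin{align}
S_n \sim C_Z^{\,n-1} \mu_Z
  &= \frac{F_p^*}{\sqrt{p}}\, \Lambda_Z^{n-1}\, \frac{F_p}{\sqrt{p}}\, \mu_Z
     \notag \\
  &= \frac{F_p^*}{p}\, \lambda_Z^{n} , \tag{15}
\end{align}

with $\lambda_Z^n$ the element-wise $n$-th power of $\lambda_Z$. As well as
being conceptually equivalent to using the convolution theorem, this may
also be seen as considering $S_n$ as a Markov chain

\[ S_1 = Z, \qquad S_j = S_{j-1} + Z \]

with transition matrix $C_Z$.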

Lemma 1 may be extended as follows.

Lemma 2. The probability mass function $\mu_{S_n}$ of a random variable
$S_n = \sum^{n} Z$, $Z \in \mathbb{F}_p$, satisfies

\[ \|\mu_{S_n} - \mu_U\|_2
   = \frac{1}{\sqrt{p}}\, \|\lambda_Z^{n} - \lambda_U\|_2 . \]

Proof. The proof is substantially the same as that of Lemma 1, using
Equation (15).

□


Lemma 3. The total variation distance of $S_n = \sum^{n} Z$,
$Z \in \mathbb{F}_p$, from uniform may be bounded by

\[ \delta_{S_n} \le (p - 1)^{1/2}\, \lambda_{Z*}^{n} \tag{16} \]

with $\lambda_{Z*}$ as in Definition 1.

Proof. The proof follows by applying Lemma 2 in the same way as Lemma
1 was applied to Theorem 3, i.e. assuming each of the $p - 1$ eigenvalues in
Lemma 2 that are of magnitude less than 1 to be bounded by $\lambda_{Z*}$,
using Equation (15).

□

Lemma 3 is a slight improvement on a known bound on the convergence
rates of Markov chains on Abelian groups; see e.g. [Ros95], Fact 7.
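
The geometric decay promised by Lemma 3 can be observed numerically. In the
sketch below the distribution of the sum of $n$ i.i.d. copies of $Z$ is
obtained by raising the DFT of the mass function to the $n$-th power
element-wise and inverting the transform, which is one way of reading
Equation (15); the result for $n = 2$ is cross-checked against a direct
cyclic convolution. The mass function and the function name
`sum_distribution` are illustrative assumptions.

```python
import numpy as np

def sum_distribution(mu, n):
    """Mass function of the sum of n i.i.d. copies of Z over F_p."""
    lam = np.fft.fft(mu)
    return np.real(np.fft.ifft(lam ** n))

p = 5
mu_Z = np.array([0.4, 0.3, 0.1, 0.1, 0.1])   # hypothetical mass function on F_5
lam_star = np.abs(np.fft.fft(mu_Z)[1:]).max()

# Cross-check for n = 2 against the direct cyclic convolution.
direct = np.array([
    sum(mu_Z[j] * mu_Z[(r - j) % p] for j in range(p)) for r in range(p)
])

rows = []
for n in range(1, 9):
    mu_S = sum_distribution(mu_Z, n)
    tvd = 0.5 * np.abs(mu_S - 1.0 / p).sum()
    bound = np.sqrt(p - 1) * lam_star ** n
    rows.append((n, tvd, bound))
    print(n, f"{tvd:.6f}", f"{bound:.6f}")
```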

We have so far assumed an ordering of the mass function $\mu_X$ of a
random variable $X \in \mathbb{F}_p$ according to the representation of the
elements of $\mathbb{F}_p$ as integers. Similarly for vector spaces
$X \in (\mathbb{F}_p)^k$ we may assume an ordering of $\mu_X$ by least
significant digit. Generalising to the distribution of the sum $S_2$ of two
random variables $X_0, X_1 \in (\mathbb{F}_p)^k$, this may still be
expressed in a form such as Equation (14), but the matrix $C_{X_1}$ is a
level $k$ block circulant with circulant base blocks of size $(p \times p)$.
Concretely, while considering all coefficients of $B$ as elements of
$(\mathbb{F}_p)^k$, we may write

\begin{align*}
B_p     &= \mathrm{circ}([0, 1, \dots, p - 1]) \\
B_{p^2} &= \mathrm{circ}(B_p,\; p + B_p,\; \dots,\; (p-1)p + B_p) \\
B_{p^k} &= \mathrm{circ}(B_{p^{k-1}},\; p^{k-1} + B_{p^{k-1}},\; \dots,\;
           (p-1)p^{k-1} + B_{p^{k-1}})
\end{align*}

with the circ function defined column-wise following [Dav94], and
$B_{p^k}$ used as short-hand to indicate a matrix therein defined as
belonging to the class $BCCB(p, p, \dots, p)$, $k$ times.

As shown in [Dav94], matrices with this structure are diagonalised
by $F_p^{\otimes k}$. This naturally extends the known structure for binary
random variables, since as discussed in Section 2 the convolution matrix for
variables in $(\mathbb{F}_2)^k$ is diagonalised by the Hadamard matrix
$H_{2^k}$, which by construction is equal to $F_2^{\otimes k}$.

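The Fourier matrix of size $p$ can be written as a Vandermonde matrix
of a primitive $p$-th root of unity $\omega_p$ as

\[ F_p = \begin{pmatrix}
   \omega_p^{0 \cdot 0}     & \cdots & \omega_p^{0 \cdot (p-1)} \\
   \vdots                   & \ddots & \vdots \\
   \omega_p^{(p-1) \cdot 0} & \cdots & \omega_p^{(p-1)(p-1)}
   \end{pmatrix} . \]

In other words, the entry in row $r$, column $s$ is

\[ F_p(r, s) = \omega_p^{rs}, \qquad r, s \in \mathbb{Z}_p . \]

By definition of the Kronecker product of two $(p \times p)$ matrices,

\begin{align}
K &= M_1 \otimes M_2 \notag \\
K(u, v) &= M_1(r_1, s_1)\, M_2(r_2, s_2), \qquad
           r_i, s_i \in \mathbb{Z}_p \notag \\
u &= r_1 p + r_2 \tag{17} \\
v &= s_1 p + s_2 \tag{18} \\
(F_p \otimes F_p)(u, v) &= \omega_p^{r_1 s_1}\, \omega_p^{r_2 s_2} \notag
\end{align}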
which extends to the $k$-fold Kronecker product by induction using the
$p$-ary representation of integers

\[ F_p^{\otimes k}(u, v) = \omega_p^{\mathbf{r} \cdot \mathbf{s}} \]

for the specific $\mathbf{r}, \mathbf{s}$ satisfying a polynomial in $p$
such as (17) and (18) of degree $k - 1$. In general, keeping either the row
or column index fixed and iterating over the other means iterating over
every element of $(\mathbb{F}_p)^k$; concretely, when evaluating the
eigenvalues of a probability mass $\mu \in \mathbb{R}^{p^k}$, the $b$-th
eigenvalue corresponds to

\begin{align}
\lambda(b) &= \sum_{j=0}^{p^k-1} F_p^{\otimes k}(b, j)\, \mu(j) \notag \\
           &= \sum_{j=0}^{p^k-1}
              \omega_p^{\mathbf{b} \cdot \mathbf{j}}\, \mu(j) . \tag{19}
\end{align}

Generalising from the case of the Walsh transform, this suggests the
definition of the $a$-th order Fourier function evaluated at $b$ as

\[ f_a(\mathbf{b}) = \omega_p^{\mathbf{a} \cdot \mathbf{b}} \]

(compare to Equation (1)), so that if $Y \in (\mathbb{F}_p)^k$ is the random
variable with mass function $\mu$, the eigenvalues may be written as

\[ \lambda_Y(b) = E[f_b(Y)] \tag{20} \]

(compare to Equation (3)).
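
The equivalence between the Kronecker power of the Fourier matrix and a
multi-dimensional DFT can be illustrated in a few lines. The sketch below
builds the $k$-fold Kronecker power of $F_p$ explicitly, applies it to a
hypothetical mass function on $(\mathbb{F}_3)^2$, and compares the result
with `numpy.fft.fftn` and with the eigenvalue written as an expectation of a
root of unity raised to a dot product of $p$-ary digits. The choice
$\omega_p = e^{-2\pi i/p}$ and the most-significant-digit-first indexing
(as in $u = r_1 p + r_2$) are conventions assumed for this example.

```python
import numpy as np

p, k = 3, 2
omega = np.exp(-2j * np.pi / p)    # a primitive p-th root of unity
F_p = omega ** np.outer(np.arange(p), np.arange(p))   # F_p(r, s) = omega^(r s)

# k-fold Kronecker power of F_p.
F_pk = np.array([[1.0 + 0.0j]])
for _ in range(k):
    F_pk = np.kron(F_pk, F_p)

# Hypothetical mass function on (F_p)^k, indexed by integers 0 .. p^k - 1.
rng = np.random.default_rng(2)
mu = rng.random(p ** k)
mu /= mu.sum()

lam_kron = F_pk @ mu
lam_fft = np.fft.fftn(mu.reshape((p,) * k)).ravel()

def digits(a):
    """p-ary digits of a, most significant first."""
    return np.array([(a // p ** (k - 1 - i)) % p for i in range(k)])

# One eigenvalue from the definition as an expectation.
b = 5
lam_def = sum(
    omega ** int(digits(b) @ digits(j)) * mu[j] for j in range(p ** k)
)
print(lam_kron[b], lam_def)
```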

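Lemma 4. The probability mass function $\mu_Y$ of a random variable
$Y \in (\mathbb{F}_p)^k$ satisfies

\[ \|\mu_Y - \mu_U\|_2
   = \frac{1}{p^{k/2}}\, \|\lambda_Y - \lambda_U\|_2 . \]

Proof.

\begin{align}
\|\mu_Y - \mu_U\|_2
  &= \left\| \frac{(F_p^{\otimes k})^* F_p^{\otimes k}}{p^k}
     (\mu_Y - \mu_U) \right\|_2 \tag{21} \\
  &= \frac{1}{p^{k/2}}\, \|\lambda_Y - \lambda_U\|_2 \tag{22}
\end{align}

□


Corollary 4. If the elements $Y(j) \in \mathbb{F}_p$ are independent but
not necessarily identically distributed,

\[ \|\mu_Y - \mu_U\|_2 = \frac{1}{p^{k/2}}
   \left\| \bigotimes_{j=0}^{k-1} \lambda_{Y(j)} - \lambda_U \right\|_2 . \]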
Proof. Since the $Y(j)$ are independent, the probability mass function of
$Y \in (\mathbb{F}_p)^k$ is

\begin{align*}
\mu_Y &= \mu_{Y(0)} \otimes \mu_{Y(1)} \otimes \cdots \otimes \mu_{Y(k-1)} \\
      &= \bigotimes_{j=0}^{k-1} \mu_{Y(j)} ,
\end{align*}

and the eigenvalues will be

\begin{align*}
\lambda_Y &= F_p^{\otimes k} \mu_Y \\
          &= \bigotimes_{j=0}^{k-1} F_p\, \mu_{Y(j)} \\
          &= \bigotimes_{j=0}^{k-1} \lambda_{Y(j)}
\end{align*}

where the second step follows by the mixed-product property of the
Kronecker product.

□


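We can now extend Theorem 1 as follows.

Theorem 4. The total variation distance of a random vector
$Y \in (\mathbb{F}_p)^k$ from uniform,

\[ \delta_Y = \mathrm{TVD}(Y, U) = \tfrac{1}{2}\, \|\mu_Y - \mu_U\|_1 , \]

is bounded by

\[ \delta_Y \le \sum_{\mathbf{b} \in (\mathbb{F}_p)^k \setminus 0}
   \left| \prod_{u=0}^{k-1} \lambda_{Y(u)}(b(u)) \right| . \tag{23} \]

Proof. Each eigenvalue may be written as

\[ \lambda_Y(b) = \prod_{u=0}^{k-1} \lambda_{Y(u)}(b(u)) . \]

The result follows directly from Lemma 4 and the known bound on
vector spaces.

□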


We can also extend Corollary 1 to establish a connection with the
number of vectors of a specific Hamming weight, but in the non-binary
case we can also go into more detail if the full composition of each vector
in the space is known, as in the following definition.

Definition 2. Let $s(\mathbf{b})$ be the composition of
$\mathbf{b} \in (\mathbb{F}_p)^k$, such that $s_j(\mathbf{b})$ is the number
of components of $\mathbf{b}$ equal to $j$:

\begin{align*}
s(\mathbf{b}) &= (s_0, s_1, \dots, s_{p-1}) \\
s_j &= |\{ i \mid b(i) = j \}| .
\end{align*}

Let $W_{(\mathbb{F}_p)^k}(t)$ be the enumerator of the elements $\mathbf{b}$
having composition equal to $t$, with $t$ being a $p$-tuple summing to $k$:

\begin{align*}
W_{(\mathbb{F}_p)^k}(t) &= |\{ \mathbf{b} \in (\mathbb{F}_p)^k \mid
                             s(\mathbf{b}) = t \}| \\
t \in T &\subset \mathbb{N}^p, \qquad \sum_j t_j = k ;
\end{align*}

then the number of $\mathbf{b}$ with Hamming weight equal to $l$ is

\[ A_l = \sum_t W(t), \qquad t \in \{ t_0 = k - l \} . \]

In particular, if instead of $\mathbf{b} \in (\mathbb{F}_p)^k$ we consider a
set of codewords $\mathbf{c} \in C$, the enumerator $W_C$ is the complete
weight enumerator of $C$, and $A_l$ its weight distribution, as defined in
[MS77], ch. 5 §6.

Corollary 5. If the $Y(j)$ are i.i.d., the total variation distance of a
random vector $Y \in (\mathbb{F}_p)^k$ from uniform is bounded by

\[ \delta_Y \le \sum_t W_{(\mathbb{F}_p)^k}(t)
   \prod_{u=0}^{p-1} |\lambda_{Y(j)}(u)|^{t_u},
   \qquad t \in \{ t_0 < k \} . \tag{24} \]

Without knowledge of the full spectrum of $Y(j)$ one may obtain a
coarser bound using the second largest eigenvalue $\lambda_{Y*}$, as in
Definition 1:

\[ \delta_Y \le \sum_{l=1}^{k} A_l\, \lambda_{Y*}^{l} , \tag{25} \]

where $A_l$ is the number of $\mathbf{b} \in (\mathbb{F}_p)^k$ with Hamming
weight $w(\mathbf{b}) = l$.

Proof. Each eigenvalue may further be written as

\begin{align*}
\lambda_Y(b) &= \prod_{u=0}^{k-1} \lambda_{Y(u)}(b(u)) \\
             &= \prod_{u=0}^{k-1} \sum_{v=0}^{p-1}
                \omega_p^{b(u) v}\, \mu_{Y(u)}(v) ,
\end{align*}

so all the $\mathbf{b}$ with identical composition $t$ will correspond to
eigenvalues of equal magnitude, leading directly to Equation (24). If the
Hamming weight $w(b(u)) = 0$, then the $u$-th term of the product will be
equal to 1; Equation (25) follows by considering the worst case
$|\lambda_{Y(u)}(j)| = \lambda_{Y*}$ for all $j > 0$.

□
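
The composition enumerator of Definition 2 and the two bounds of Corollary 5
can be evaluated by brute force for small parameters. The following sketch
does this for three i.i.d. symbols over $\mathbb{F}_3$, taking the modulus
of each eigenvalue in the product, and compares both bounds with the exact
total variation distance. The mass function, the parameters $p = k = 3$ and
the variable names are assumptions chosen for illustration.

```python
import itertools
from collections import Counter

import numpy as np

p, k = 3, 3
mu = np.array([0.5, 0.3, 0.2])     # hypothetical mass function of each Y(j)
lam = np.fft.fft(mu)               # eigenvalues of a single symbol
lam_star = np.abs(lam[1:]).max()   # greatest non-trivial eigenvalue

# Exact distribution of Y: Kronecker product of the k identical marginals.
mu_Y = np.array([1.0])
for _ in range(k):
    mu_Y = np.kron(mu_Y, mu)
tvd = 0.5 * np.abs(mu_Y - p ** -k).sum()

# Enumerator W(t): number of b in (F_p)^k with composition t.
W = Counter(
    tuple(b.count(j) for j in range(p))
    for b in itertools.product(range(p), repeat=k)
)

# Bound over compositions with t_0 < k, using |lambda(u)|^(t_u).
bound_composition = sum(
    count * np.prod(np.abs(lam) ** np.array(t))
    for t, count in W.items()
    if t[0] < k
)

# Coarser bound using only Hamming weights and lam_star.
A = Counter()
for t, count in W.items():
    A[k - t[0]] += count
bound_weight = sum(A[l] * lam_star ** l for l in range(1, k + 1))
print(tvd, bound_composition, bound_weight)
```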
5 Fourier bound on entropy extractors

In order to arrive at a bound involving the distribution of weights, we
begin by showing there is a unique association between code words and
eigenvalues, just as there was with the bias of individual bits in the binary
case (see Theorem 2).

If $Y = GX$, with $X$ a random vector in $(\mathbb{F}_p)^n$ and $G$ a
generator matrix of an $(n, k, d)$ code $C$ over $\mathbb{F}_p$, we can
establish a direct link between eigenvalues of $Y$ and codewords of $C$
using Equation (20):

\begin{align}
\lambda_Y(b) &= E[f_b(GX)] \notag \\
  &= \sum_{j=0}^{p^n-1} \omega_p^{\mathbf{b} \cdot G\mathbf{j}}\, \mu_X(j)
     \notag \\
  &= \sum_{j=0}^{p^n-1} \omega_p^{\mathbf{c} \cdot \mathbf{j}}\, \mu_X(j)
     \tag{26}
\end{align}

with $\mathbf{c}^T = \mathbf{b}^T G$ a particular word of the code. Note
that choosing a particular $(k \times n)$ matrix $G$ is equivalent to
selecting the specific rows, specified by all the $p^k$ codewords
$\mathbf{c}$, that form a subset of the $p^n$ rows of $F_p^{\otimes n}$ by
which to multiply $\mu_X$.

Having noted this fundamental link in principle in the same manner as
for the binary case (see Equation (10)), and having developed the required
tools in Section 4, we can immediately state some more specific results
for particular cases of practical interest, beginning with an extension of
Theorem 4.

Theorem 5. Let $Y = GX$, where $X \in (\mathbb{F}_p)^n$ is a random vector
of length $n$, with each entry being an independent but not necessarily
identically distributed variable $X(j) \in \mathbb{F}_p$ with mass function
$\mu_{X(j)} \in \mathbb{R}^p$, and $G$ is the generator matrix of an
$(n, k, d)$ linear code over $\mathbb{F}_p$. Then the $b$-th eigenvalue of
the distribution of $Y$ is

\[ \lambda_Y(b) = \prod_{j=0}^{n-1} \lambda_{X(j)}(c(j)) \tag{27} \]

where $c(j) \in \mathbb{F}_p$ is the $j$-th symbol in the codeword
$\mathbf{c}^T = \mathbf{b}^T G$.

Proof. The specific combination corresponding to a word $\mathbf{c}$ is,
from Equation (26):

\begin{align*}
\lambda_Y(b) &= \sum_{j=0}^{p^n-1}
                \omega_p^{\mathbf{c} \cdot \mathbf{j}}\, \mu_X(j) \\
             &= \prod_{u=0}^{n-1} \sum_{v=0}^{p-1}
                \omega_p^{c(u) v}\, \mu_{X(u)}(v) .
\end{align*}

□
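
The product formula of Theorem 5 lends itself to a direct numerical check.
The sketch below enumerates all inputs of a small ternary code to obtain the
exact distribution of $Y = GX$, computes its eigenvalues with a
two-dimensional DFT, and compares them with the product of single-symbol
eigenvalues selected by the codeword $\mathbf{c}^T = \mathbf{b}^T G$. The
generator matrix (an illustrative $(4, 2, 3)$ ternary code) and the four
mass functions are assumptions made for this example only.

```python
import itertools

import numpy as np

p = 3
G = np.array([[1, 0, 1, 1],
              [0, 1, 1, 2]])       # illustrative (4, 2, 3) code over F_3
k, n = G.shape

# Hypothetical, non-identical mass functions of the independent symbols X(j).
mu_X = np.array([[0.5, 0.3, 0.2],
                 [0.6, 0.1, 0.3],
                 [0.2, 0.2, 0.6],
                 [0.4, 0.4, 0.2]])
lam_X = np.fft.fft(mu_X, axis=1)   # row j holds the eigenvalues of X(j)

# Exact distribution of Y = G X by enumeration of X in (F_p)^n.
mu_Y = np.zeros((p,) * k)
for x in itertools.product(range(p), repeat=n):
    prob = np.prod([mu_X[j, x[j]] for j in range(n)])
    y = (G @ np.array(x)) % p
    mu_Y[tuple(int(v) for v in y)] += prob
lam_Y = np.fft.fftn(mu_Y)

# Product of single-symbol eigenvalues selected by the codeword c = b^T G.
lam_product = np.zeros((p,) * k, dtype=complex)
for b in itertools.product(range(p), repeat=k):
    c = (np.array(b) @ G) % p
    lam_product[b] = np.prod([lam_X[j, c[j]] for j in range(n)])

print(np.max(np.abs(lam_Y - lam_product)))
```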
Corollary 6. If all $X(j)$ are also i.i.d., the total variation distance of
a random vector $Y \in (\mathbb{F}_p)^k$ from uniform is bounded by

\[ \delta_Y \le \sum_t W_C(t)
   \prod_{u=0}^{p-1} |\lambda_{X(j)}(u)|^{t_u},
   \qquad t \in \{ t_0 < n \} . \tag{28} \]

Without knowledge of the full spectrum of $X(j)$ one may obtain a
coarser bound using the second largest eigenvalue $\lambda_{X*}$, as in
Definition 1:

\[ \delta_Y \le \sum_{l=d}^{n} A_l\, \lambda_{X*}^{l} . \tag{29} \]

Here $W_C$ and $A_l$ are the complete weight enumerator and weight
distribution of $C$, respectively, as in Definition 2.

Proof. The proof follows in the same manner as for Corollary 5.

□

The above can be viewed as a statement regarding the sum of $n$ random
variables, each in $\mathbb{F}_p$: if only $w(\mathbf{c})$ symbols are
non-zero, this corresponds to a sum of $w(\mathbf{c})$ terms.

Corollary 7. Using the minimum distance $d$, one may obtain the bound

\[ \delta_Y \le (p^k - 1)\, \lambda_{X*}^{d} . \]

Note that all the results in this section extend to random vectors
$X \in (\mathbb{F}_{p^m})^n$, that is to sequences of random vectors taking
values in $\mathbb{F}_{p^m}$, by using the right matrix to diagonalise the
convolution matrix of the sum of two such variables in order to compute the
eigenvalues, and assuming the symbols of the generator matrix are taken in
the same field, i.e. the code is chosen over $\mathbb{F}_{p^m}$. Following
Section 4, this may be done using the Kronecker product
$F_p^{\otimes mn}$.

Comparing Corollaries 3 and 7, it appears that a bound based solely
on the minimum distance quickly risks becoming far from sharp as the
dimension of the underlying random variable $X(j)$ increases.


6 Non-linear codes 

As shown in [Lac08], it is possible to construct ad-hoc non-linear maps 
with better properties than linear ones for specific cases; it was also noted 
that for a given compression ratio k/n of the output, there may exist 
non-linear codes with a greater minimum distance than any linear code. 
Since non-linear codes do not have a generator matrix G they are not 
straightforward to cover using the tools developed thus far, but we may 
use some of them to frame the fundamental issue with non-linear maps, as 
we see it, in terms of examining the distribution of the product of random 
variables. Consider the special case $X_1, X_0 \in \mathbb{F}_2$, and let
their product be $P_2$; its mass function may be written as follows:

\begin{align*}
P_2 &= X_1 X_0 \\
\mu_{P_2} &= \begin{pmatrix}
  \mu_{X_1}(0) + \mu_{X_1}(1)\, \mu_{X_0}(0) \\
  \mu_{X_1}(1)\, \mu_{X_0}(1)
\end{pmatrix} .
\end{align*}

As long as neither $X_0$ nor $X_1$ follows the categorical distribution
with $P(1) = 1$, the probability of their product being zero is strictly
greater than either of the initial probabilities. By induction,

\[ \lim_{j \to \infty} \mu_{P_j} = \begin{pmatrix} 1 \\ 0 \end{pmatrix} . \]

We may conclude that, while increasing the number of linear operations 
will lead to the uniform distribution in expectation, increasing the number 
of non-linear operations will in general lead to a worsening of the output 
distribution, except in very specific cases. By way of example, consider 
the two Bernoulli variables 

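\[ \mu_{B_+} = \begin{pmatrix} 1 - 2^{-1/2} \\ 2^{-1/2} \end{pmatrix},
   \qquad
   \mu_{B_-} = \begin{pmatrix} 2^{-1/2} \\ 1 - 2^{-1/2} \end{pmatrix} . \]

Note that the bias of these two random variables is identical; however,
the distributions of their products are quite different:

\[ \mu_{B_+ B_+} = \begin{pmatrix} 2^{-1} \\ 2^{-1} \end{pmatrix},
   \qquad
   \mu_{B_- B_-} = \begin{pmatrix} 2^{1/2} - \tfrac{1}{2} \\
                                   \tfrac{3}{2} - 2^{1/2} \end{pmatrix} . \]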

While it is possible to find non-linear maps that are optimal in some spe¬ 
cific cases, we observe that not only does repeated processing by nonlinear 
maps in general lead to a worsening of the output, but it is also necessary 
to know or assume a specific distribution of the sequence to be processed 
to even attempt to find such a processing; even under the assumption of
i.i.d. binary variables, knowledge of the bias of each bit is not sufficient. 
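
The example above is easy to reproduce numerically. The sketch below
computes the mass function of the product of two independent bits, checks
that the two example variables share the same bias while their squared
products differ, and tracks how the probability of zero approaches 1 under
repeated multiplication. The function name `product_mass` and the number of
iterations are illustrative choices.

```python
import math

def product_mass(p1_a, p1_b):
    """Mass function (P(0), P(1)) of the product of two independent bits."""
    p_one = p1_a * p1_b
    return (1.0 - p_one, p_one)

r = 2 ** -0.5
p_plus, p_minus = r, 1.0 - r      # P(B = 1) for the two example variables

bias_plus = abs(2 * p_plus - 1)
bias_minus = abs(2 * p_minus - 1)

mu_pp = product_mass(p_plus, p_plus)      # product of two copies of B_+
mu_mm = product_mass(p_minus, p_minus)    # product of two copies of B_-

# Repeated products drive P(0) towards 1 whenever P(1) < 1.
p_one = p_plus
history = []
for _ in range(20):
    p_one *= p_plus
    history.append(1.0 - p_one)

print(bias_plus, bias_minus, mu_pp, mu_mm, history[-1])
```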


7 Conclusions 

We have shown new bounds on the statistical distance from the uniform 
distribution of random number sequences conditioned by linear transfor¬ 
mations chosen from the generator matrices of linear codes, based on the 
assumption of independent generator output in $\mathbb{F}_{p^m}$; we have
also shown how these bounds are natural generalisations of known bounds in
$\mathbb{F}_2$ once
the structure behind the known bounds is made clear. When appropriate, 
these bounds allow the practitioner both to assess the performance of a 
chosen matrix as well as to make an informed choice from a pre-existing set 
with well-defined properties. This would seem particularly advantageous 
with respect to randomly choosing boolean matrices as conditioners and 
relying on the leftover hash lemma [ILL89] to conclude sufficiently good
output will be produced in expectation. 